In this work, we propose a novel perspective on the problem of patch correctness assessment: a correct patch implements changes that "answer" the question posed by the buggy behaviour. Concretely, we turn patch correctness assessment into a question-answering problem. To tackle this problem, our intuition is that natural language processing can provide the necessary representations and models for assessing the semantic correlation between a bug (question) and a patch (answer). Specifically, we take as inputs the bug report as well as a natural language description of the generated patch. Our approach, Quatrain, first relies on state-of-the-art commit message generation models to produce the relevant input associated with each generated patch. It then leverages a neural network architecture to learn the semantic correlation between bug reports and commit messages. Experiments on a large dataset of 9135 patches generated for three bug datasets (Defects4J, Bugs.jar and Bears) show that Quatrain can achieve an AUC of 0.886 in predicting patch correctness, recalling 93% of correct patches while filtering out 62% of incorrect patches. Our experimental results further demonstrate the influence of input quality on prediction performance. We also perform experiments highlighting that the model indeed learns the relationship between bug reports and descriptions of code changes. Finally, we compare against prior work and discuss the benefits of our approach.
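As a rough illustration of the relevance-scoring step, the sketch below frames the bug report and the generated patch description as a sentence pair and scores them with a sequence-pair classifier. The checkpoint name is a hypothetical placeholder; Quatrain's actual commit-message generation and correlation architecture is not reproduced here.

```python
# Minimal sketch: score bug-report / patch-description relatedness with a
# sequence-pair classifier. "your-finetuned-relevance-model" is a placeholder
# checkpoint, not Quatrain's released model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "your-finetuned-relevance-model"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def patch_relevance(bug_report: str, patch_description: str) -> float:
    """Return P(correct) for a (bug report, generated commit message) pair."""
    inputs = tokenizer(bug_report, patch_description,
                       truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

score = patch_relevance(
    "NullPointerException when parsing an empty configuration file",
    "Add a null check before iterating over configuration entries")
print(f"predicted correctness score: {score:.3f}")
```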
Representation learning of source code is essential for applying machine learning to software engineering tasks. Learning code representations across different programming languages has been shown to be more effective than learning from single-language datasets, since the larger amount of training data in multi-language datasets improves the model's ability to extract language-agnostic information from source code. However, existing multi-language models overlook language-specific information, which is crucial for downstream tasks when training on multi-language datasets, and focus only on learning parameters shared across languages. To address this problem, we propose MetaTPTrans, a meta-learning approach for multilingual code representation learning. MetaTPTrans generates different parameters for the feature extractor according to the programming language of the input source code snippet, enabling the model to learn both language-agnostic and language-specific information. Experimental results show that MetaTPTrans significantly improves the F1 score of state-of-the-art approaches by up to 2.40 percentage points for code summarization, a language-agnostic task, and the Top-1 (Top-5) prediction accuracy by up to 7.32 (13.15) percentage points for code completion, a language-specific task.
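A minimal sketch of the parameter-generation idea, assuming a simple hypernetwork over a single linear feature-extraction layer (not the paper's Transformer-based architecture): a language embedding is mapped to the weights of the layer applied to token features, so the extracted features become language-specific while the meta-learner itself is shared.

```python
# Minimal sketch of language-conditioned parameter generation (illustrative,
# not MetaTPTrans itself): a meta-learner produces the weights of a feature
# layer from an embedding of the input's programming language.
import torch
import torch.nn as nn

class LanguageConditionedExtractor(nn.Module):
    def __init__(self, num_languages: int, d_model: int, lang_dim: int = 32):
        super().__init__()
        self.lang_embed = nn.Embedding(num_languages, lang_dim)
        # Hypernetwork: produces weight and bias of a d_model x d_model layer.
        self.weight_gen = nn.Linear(lang_dim, d_model * d_model)
        self.bias_gen = nn.Linear(lang_dim, d_model)
        self.d_model = d_model

    def forward(self, token_feats: torch.Tensor, lang_id: torch.Tensor):
        # token_feats: (batch, seq_len, d_model); lang_id: (batch,)
        z = self.lang_embed(lang_id)                                  # (batch, lang_dim)
        W = self.weight_gen(z).view(-1, self.d_model, self.d_model)   # per-language weights
        b = self.bias_gen(z).unsqueeze(1)                             # (batch, 1, d_model)
        # Apply the generated, language-specific transformation.
        return torch.relu(torch.bmm(token_feats, W) + b)

extractor = LanguageConditionedExtractor(num_languages=4, d_model=64)
feats = torch.randn(2, 10, 64)               # dummy token features
langs = torch.tensor([0, 2])                 # e.g. 0 = Python, 2 = Java
print(extractor(feats, langs).shape)         # torch.Size([2, 10, 64])
```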
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
A Digital Twin (DT) is a simulation of a physical system that provides information to make decisions that add economic, social or commercial value. The behaviour of a physical system changes over time; a DT must therefore be continually updated with data from the physical system to reflect its changing behaviour. For resource-constrained systems, updating a DT is non-trivial because of challenges such as on-board learning and off-board data transfer. This paper presents a framework for updating data-driven DTs of resource-constrained systems geared towards system health monitoring. The proposed solution consists of: (1) an on-board system running a light-weight DT allowing the prioritisation and parsimonious transfer of data generated by the physical system; and (2) off-board robust updating of the DT and detection of anomalous behaviours. Two case studies using a production gas turbine engine system demonstrate the digital representation accuracy for real-world, time-varying physical systems.
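A toy sketch of the prioritisation idea, under assumed models and thresholds (not the paper's framework): a light-weight on-board surrogate predicts the expected response, and only measurements with large residuals are transferred off-board for DT updating and anomaly checks.

```python
# Illustrative sketch: an on-board light-weight surrogate predicts the
# expected sensor response, and only samples whose residual exceeds a
# threshold are prioritised for off-board transfer. The surrogate, the data
# and the threshold are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Light-weight on-board surrogate: a linear fit of output vs. operating input.
inputs = rng.uniform(0.0, 1.0, size=500)
outputs = 3.0 * inputs + 0.5 + rng.normal(scale=0.05, size=500)
coeffs = np.polyfit(inputs, outputs, deg=1)          # "digital twin" parameters

def residual(u: float, y: float) -> float:
    """Absolute mismatch between a measurement and the twin's prediction."""
    return abs(y - np.polyval(coeffs, u))

THRESHOLD = 0.2  # assumed prioritisation threshold

# New measurements: only high-residual samples are transferred off-board.
new_u = rng.uniform(0.0, 1.0, size=20)
new_y = 3.0 * new_u + 0.5 + rng.normal(scale=0.05, size=20)
new_y[3] += 1.0                                      # injected anomaly
to_transfer = [(u, y) for u, y in zip(new_u, new_y) if residual(u, y) > THRESHOLD]
print(f"transferring {len(to_transfer)} of {len(new_u)} samples for DT update")
```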
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
In recent years, social media has been widely explored as a potential source of communication and information in disasters and emergency situations. Several interesting works and case studies of disaster analytics exploring different aspects of natural disasters have been already conducted. Along with the great potential, disaster analytics comes with several challenges mainly due to the nature of social media content. In this paper, we explore one such challenge and propose a text classification framework to deal with Twitter noisy data. More specifically, we employed several transformers both individually and in combination, so as to differentiate between relevant and non-relevant Twitter posts, achieving the highest F1-score of 0.87.
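One way the "in combination" setting can be realised is by averaging class probabilities from several fine-tuned transformer classifiers; the sketch below assumes hypothetical checkpoint names and is not the paper's exact ensemble.

```python
# Sketch of a probability-averaging ensemble for relevant / non-relevant
# tweet classification. Checkpoint names are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINTS = ["ckpt-bert", "ckpt-roberta", "ckpt-distilbert"]  # hypothetical
models = [(AutoTokenizer.from_pretrained(c),
           AutoModelForSequenceClassification.from_pretrained(c, num_labels=2).eval())
          for c in CHECKPOINTS]

def ensemble_predict(tweet: str) -> int:
    """Return 1 if the tweet is judged disaster-relevant, else 0."""
    probs = []
    for tok, clf in models:
        inputs = tok(tweet, truncation=True, return_tensors="pt")
        with torch.no_grad():
            probs.append(torch.softmax(clf(**inputs).logits, dim=-1))
    return int(torch.stack(probs).mean(dim=0).argmax(dim=-1))

print(ensemble_predict("Bridge collapsed downtown, several roads are blocked"))
```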
We present a Machine Learning (ML) case study to illustrate the challenges of clinical translation for a real-time AI-empowered echocardiography system with data from ICU patients in LMICs. The case study covers data preparation, curation and labelling of 2D ultrasound videos from 31 ICU patients in LMICs, as well as model selection, validation and deployment of three thinner neural networks to classify the apical four-chamber view. Results of the ML heuristics show that thinner networks can be implemented, validated and applied to classify 4CV with limited datasets. We conclude by noting the need for (a) datasets with greater diversity of demographics and diseases, and (b) further investigation of thinner models that can run on low-cost hardware so that they can be clinically translated to the ICU in LMICs. The code and other resources to reproduce this work are available at https://github.com/vital-ultrasound/ai-assisted-echocardiography-for-low-resource-countries.
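For illustration only, a thin convolutional view classifier might look like the sketch below; the layer widths and input size are assumptions, not the three networks evaluated in the work.

```python
# Minimal sketch of a "thin" convolutional classifier for apical four-chamber
# view detection on ultrasound frames; sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ThinViewClassifier(nn.Module):
    def __init__(self, num_classes: int = 2, width: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, width, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(width * 2, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))

model = ThinViewClassifier()
frames = torch.randn(4, 1, 112, 112)          # dummy grayscale ultrasound frames
print(model(frames).shape)                    # torch.Size([4, 2])
print(sum(p.numel() for p in model.parameters()), "parameters")
```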
The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations have been applied, such as translation. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. We show that LeMDA can (1) profoundly improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprised of image, text, and tabular data.
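A minimal sketch of feature-space augmentation in the spirit of LeMDA, with stand-in encoders and without the method's actual joint training objective: a small learned network perturbs the fused multimodal feature, and the augmented feature provides an extra training view for the task head.

```python
# Illustrative sketch of learned feature-space augmentation for multimodal
# inputs. Encoders, dimensions and the missing training objective are
# assumptions; this is not the LeMDA implementation.
import torch
import torch.nn as nn

class FeatureSpaceAugment(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z + self.net(z)               # learned residual perturbation

text_enc = nn.Linear(300, 128)               # stand-ins for modality encoders
image_enc = nn.Linear(512, 128)
augment = FeatureSpaceAugment(256)
head = nn.Linear(256, 10)

text_feat = text_enc(torch.randn(8, 300))
image_feat = image_enc(torch.randn(8, 512))
fused = torch.cat([text_feat, image_feat], dim=-1)   # (8, 256)

logits_clean = head(fused)
logits_aug = head(augment(fused))            # augmented view as extra training signal
print(logits_clean.shape, logits_aug.shape)
```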
The SINDy algorithm has been successfully used to identify the governing equations of dynamical systems from time series data. In this paper, we argue that this makes SINDy a potentially useful tool for causal discovery, and that existing tools for causal discovery can be used to dramatically improve the performance of SINDy as a tool for robust sparse modeling and system identification. We then demonstrate empirically that augmenting the SINDy algorithm with tools from causal discovery provides engineers with a tool for learning causally robust governing equations.
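For readers unfamiliar with SINDy, the sketch below shows its core step, sequentially thresholded least squares over a candidate function library, on a toy linear system; the causal-discovery augmentation proposed in the paper is not shown.

```python
# Hedged sketch of the core SINDy step: sequentially thresholded least
# squares recovers a sparse coefficient matrix Xi with dxdt ~= Theta @ Xi.
import numpy as np

def sindy_stlsq(theta: np.ndarray, dxdt: np.ndarray,
                threshold: float = 0.1, iters: int = 10) -> np.ndarray:
    """Sparse regression over a library of candidate functions theta."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):            # refit the active terms per state
            big = ~small[:, k]
            if big.any():
                xi[big, k] = np.linalg.lstsq(theta[:, big], dxdt[:, k], rcond=None)[0]
    return xi

# Toy example: recover dx/dt = -2x + 3y, dy/dt = 0.5x from noisy samples.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
dxdt = x @ np.array([[-2.0, 0.5], [3.0, 0.0]]) + rng.normal(scale=0.01, size=(200, 2))
theta = np.column_stack([x[:, 0], x[:, 1], x[:, 0] * x[:, 1]])  # candidate library
print(np.round(sindy_stlsq(theta, dxdt), 2))
```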
Our aim is to build autonomous agents that can solve tasks in environments like Minecraft. To do so, we use an imitation learning-based approach. We formulate our control problem as a search problem over a dataset of expert demonstrations, where the agent copies actions from a similar demonstration trajectory of image-action pairs. We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model. The agent copies the actions from the expert trajectory as long as the distance between the state representations of the agent and the selected expert trajectory does not diverge; the proximity search is then repeated. Our approach effectively recovers meaningful demonstration trajectories and yields human-like agent behavior in the Minecraft environment.
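The control loop can be pictured with the sketch below, which replaces the VPT encoder and the BASALT MineRL data with random stand-ins: embed the current observation, find the nearest expert latent, copy the expert's actions, and re-run the search once the latent distance exceeds a divergence threshold (an assumed hyperparameter).

```python
# Illustrative sketch of search-based control by copying expert actions.
# The latents, actions and threshold are random stand-ins, not the paper's
# VPT embeddings or MineRL data.
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, BANK_SIZE, DIVERGENCE = 64, 1000, 4.0

expert_latents = rng.normal(size=(BANK_SIZE, LATENT_DIM))    # stand-in embeddings
expert_actions = rng.integers(0, 8, size=BANK_SIZE)          # stand-in discrete actions

def nearest_expert(z: np.ndarray) -> int:
    """Proximity search: index of the closest expert state in latent space."""
    return int(np.argmin(np.linalg.norm(expert_latents - z, axis=1)))

def act(agent_latent, cursor):
    """Return (action, cursor); re-search when the trajectory has diverged."""
    if cursor is None or np.linalg.norm(expert_latents[cursor] - agent_latent) > DIVERGENCE:
        cursor = nearest_expert(agent_latent)                 # new proximity search
    action = int(expert_actions[cursor])
    return action, min(cursor + 1, BANK_SIZE - 1)             # follow the expert trajectory

cursor = None
for _ in range(5):
    z = rng.normal(size=LATENT_DIM)          # stand-in agent observation embedding
    action, cursor = act(z, cursor)
    print(action)
```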